
    A Spectral Method that Worked Well in the SPiCe'16 Competition

    We present the methods used in our submission to the Sequence Prediction ChallengE (SPiCe'16). The two methods used to solve the competition tasks were spectral learning and a count-based method. Spectral learning led to better results on most of the problems.
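    Spectral learning of weighted finite-state automata (WFA) typically factorizes an empirical Hankel matrix with a truncated SVD and reads the automaton's operators off the factors. The sketch below shows one common formulation (not necessarily the exact variant used in the submission) on a toy rank-1 function over a one-letter alphabet; the names `H`, `Ha`, `alpha`, and `beta` are illustrative.

```python
import numpy as np

# Toy target: f(a^n) = 0.5 ** (n + 1), realizable by a 1-state WFA.
def f(n):
    return 0.5 ** (n + 1)

# Hankel matrix H[i, j] = f(prefix a^i . suffix a^j), plus the shifted
# block Ha for the single symbol a.
N = 5
H = np.array([[f(i + j) for j in range(N)] for i in range(N)])
Ha = np.array([[f(i + 1 + j) for j in range(N)] for i in range(N)])

# Rank-r truncated SVD gives the factorization the spectral method uses.
r = np.linalg.matrix_rank(H)          # numerical Hankel rank (= 1 here)
U, s, Vt = np.linalg.svd(H)
V = Vt[:r].T                          # top-r right singular vectors

P = H @ V                             # rank-r "forward" factor
A = np.linalg.pinv(P) @ Ha @ V        # transition operator for symbol a
alpha = H[0] @ V                      # initial weights (empty-prefix row)
beta = np.linalg.pinv(P) @ H[:, 0]    # final weights (empty-suffix column)

def f_hat(n):
    """Evaluate the learned WFA on the string a^n."""
    v = alpha.copy()
    for _ in range(n):
        v = v @ A
    return float(v @ beta)
```

    The recovered automaton reproduces the target exactly, including on strings longer than those used to build the Hankel matrix, which is the sense in which the method generalizes.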

    An Improved Crowdsourcing Based Evaluation Technique for Word Embedding Methods

    In this proposal track paper, we present a crowdsourcing-based word embedding evaluation technique that is more reliable and linguistically justified. The method is designed for intrinsic evaluation and extends the approach proposed by Schnabel et al. (2015). Our improved evaluation technique captures word relatedness based on the word context.
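    A standard way to score such an intrinsic, relatedness-based evaluation is to correlate the model's cosine similarities with human relatedness judgements using Spearman rank correlation. The sketch below illustrates only that scoring step; the embeddings and crowd ratings are hypothetical, and no tie handling is included.

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def spearman(x, y):
    # Rank both variables, then take the Pearson correlation of the
    # ranks (no tie correction -- fine for this illustration).
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean(); ry -= ry.mean()
    return float(rx @ ry / (np.linalg.norm(rx) * np.linalg.norm(ry)))

# Hypothetical toy embeddings and crowd relatedness ratings (0-10 scale).
emb = {
    "cat":   np.array([0.90, 0.10, 0.00]),
    "dog":   np.array([0.85, 0.15, 0.05]),
    "car":   np.array([0.00, 0.90, 0.30]),
    "truck": np.array([0.10, 0.80, 0.40]),
}
pairs = [("cat", "dog"), ("car", "truck"), ("cat", "car"), ("dog", "truck")]
human = [9.0, 8.5, 1.0, 1.5]                        # crowd judgements
model = [cosine(emb[a], emb[b]) for a, b in pairs]  # embedding similarities

score = spearman(human, model)  # 1.0 when the two rankings agree exactly
```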

    Challenges of Enforcing Regulations in Artificial Intelligence Act --- Analyzing Quantity Requirement in Data and Data Governance

    To make Artificial Intelligence (AI) systems and services accountable and regulated in the European Union market, in April 2021 the European Commission published a proposal, 'Laying Down Harmonised Rules on Artificial Intelligence (Artificial Intelligence Act)', widely known as the Artificial Intelligence Act (AI Act). Since then, many concerns have been raised about compliance and whether the regulations are enforceable. However, to the best of our knowledge, none of these works has provided an explicit technical analysis of the challenges in enforcing the regulation. Among the 85 Articles in the AI Act, we focus on Article 10, the central regulatory requirement for data and data governance. In this paper, we analyze a specific requirement, data quantity, to show the challenges of enforcing this requirement in a principled way. Our analysis draws on deep learning modelling and machine learning generalization theory.
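    The generalization-theory side of such an analysis can be illustrated with a textbook example: even for the simplest question, "how much held-out data is needed to certify a model's error rate?", the answer from a Hoeffding bound depends on the chosen precision and confidence rather than being a fixed number, which is one reason a single quantity threshold is hard to legislate. The function below is a generic illustration, not a formula from the AI Act or from the paper.

```python
import math

def hoeffding_sample_size(eps, delta):
    """Smallest m with 2 * exp(-2 * m * eps**2) <= delta, i.e. the
    test-set size needed to estimate a classifier's true error to
    within +/- eps with probability at least 1 - delta."""
    return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

# Tightening either knob inflates the requirement sharply:
m_loose = hoeffding_sample_size(eps=0.05, delta=0.05)  # ~hundreds
m_tight = hoeffding_sample_size(eps=0.01, delta=0.01)  # ~tens of thousands
```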

    Improving Language Modelling with Noise Contrastive Estimation

    Neural language models do not scale well when the vocabulary is large. Noise contrastive estimation (NCE) is a sampling-based method that allows for fast learning with large vocabularies. Although NCE has shown promising performance in neural machine translation, its full potential has not been demonstrated in the language modelling literature, and a sufficient investigation of the hyperparameters of NCE-based neural language models has been missing. In this paper, we showed that NCE can be a very successful approach in neural language modelling when the hyperparameters of the neural network are tuned appropriately. We introduced the 'search-then-converge' learning rate schedule for NCE and designed a heuristic that specifies how to use this schedule. The impact of the other important hyperparameters, such as the dropout rate and the weight initialisation range, was also demonstrated. Using a popular benchmark, we showed that appropriately tuned NCE-based neural language models outperform the state-of-the-art single-model methods based on standard LSTM recurrent neural networks.
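    The paper's exact schedule parameters are not given here, but the classic 'search-then-converge' schedule of Darken and Moody keeps the learning rate roughly constant for an initial search phase and then decays it like 1/t. A minimal sketch, with assumed parameter names `lr0` and `tau`:

```python
def search_then_converge(lr0, tau):
    """Darken & Moody's search-then-converge schedule:
    lr(t) ~ lr0 for t << tau (search), lr(t) ~ lr0 * tau / t after (converge)."""
    def lr(t):
        return lr0 / (1.0 + t / tau)
    return lr

sched = search_then_converge(lr0=1.0, tau=100)
# step 0 is at the full rate, step tau at half, late steps decay like 1/t
rates = [sched(0), sched(100), sched(10_000)]
```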

    Relating RNN layers with the spectral WFA ranks in sequence modelling

    We analyse Recurrent Neural Networks (RNNs) to understand the significance of multiple LSTM layers. We argue that Weighted Finite-state Automata (WFA) trained using a spectral learning algorithm are helpful for analysing RNNs. Our results suggest that multiple LSTM layers in RNNs help in learning distributed hidden states, but have a smaller impact on the ability to learn long-term dependencies. The analysis is based on empirical results; however, relevant theory is discussed, wherever possible, to justify and support our conclusions.

    Capturing Changes in Mood Over Time in Longitudinal Data Using Ensemble Methodologies

    This paper presents the system description of team BLUE for Task A of the CLPsych 2022 Shared Task on identifying changes in mood and behaviour in longitudinal textual data. These moments of change are signals that can be used to screen for and prevent suicide attempts. To detect these changes, we experimented with several text representation methods, such as TF-IDF, sentence embeddings and emotion-informed embeddings, combined with several classical machine learning classifiers. We submitted three runs of ensemble systems based on maximum voting over the predictions of the best-performing models. Of the nine participating teams in Task A, our team ranked second in the Precision-oriented Coverage-based Evaluation, with a score of 0.499. Our best system was an ensemble of Support Vector Machine, Logistic Regression, and Adaptive Boosting classifiers using emotion-informed embeddings as the input representation, which can model both the linguistic and emotional information found in users' posts.
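    Maximum (hard) voting of the kind described can be sketched in a few lines: each model casts one vote per example and the majority label wins. The per-model predictions below are hypothetical, not the teams' actual outputs.

```python
from collections import Counter

def max_vote(predictions):
    """Hard-voting ensemble: predictions is a list of per-model label
    lists; for each example the most common label wins (ties broken by
    first-seen order, per Counter.most_common)."""
    n_examples = len(predictions[0])
    fused = []
    for i in range(n_examples):
        votes = [model_preds[i] for model_preds in predictions]
        fused.append(Counter(votes).most_common(1)[0][0])
    return fused

# Hypothetical per-model predictions (1 = mood change, 0 = no change)
svm_p = [1, 0, 1, 0]
lr_p  = [1, 1, 1, 0]
ada_p = [0, 1, 1, 0]
ensemble = max_vote([svm_p, lr_p, ada_p])  # majority label per example
```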

    Event Causality Identification with Causal News Corpus -- Shared Task 3, CASE 2022

    The Event Causality Identification Shared Task of CASE 2022 involved two subtasks working on the Causal News Corpus. Subtask 1 required participants to predict whether a sentence contains a causal relation; this is a supervised binary classification task. Subtask 2 required participants to identify the Cause, Effect and Signal spans in each causal sentence; this can be seen as a supervised sequence labelling task. For both subtasks, participants uploaded their predictions for a held-out test set, and ranking was based on binary F1 and macro F1 scores for Subtask 1 and Subtask 2, respectively. This paper summarizes the work of the 17 teams that submitted their results to our competition and the 12 system description papers that were received. The best F1 scores achieved for Subtask 1 and 2 were 86.19% and 54.15%, respectively. All the top-performing approaches involved pre-trained language models fine-tuned to the targeted task. We further discuss these approaches and analyze errors across participants' systems in this paper.
    Comment: Accepted to the 5th Workshop on Challenges and Applications of Automated Extraction of Socio-political Events from Text (CASE 2022).
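    The two ranking metrics differ in what they average: binary F1 scores only the positive class, while macro F1 averages per-class F1 with equal weight, so minority classes count as much as majority ones. A minimal sketch (the label values are illustrative):

```python
def binary_f1(gold, pred, positive=1):
    """F1 of a single target class (the shared task's binary F1)."""
    tp = sum(g == p == positive for g, p in zip(gold, pred))
    fp = sum(p == positive != g for g, p in zip(gold, pred))
    fn = sum(g == positive != p for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

def macro_f1(gold, pred, labels):
    """Unweighted mean of per-class F1 -- each label counts equally."""
    return sum(binary_f1(gold, pred, c) for c in labels) / len(labels)

gold = [1, 1, 0, 0, 1]
pred = [1, 0, 0, 1, 1]
b = binary_f1(gold, pred)          # scores class 1 only
m = macro_f1(gold, pred, [0, 1])   # averages F1 of classes 0 and 1
```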